Abstract



Standard parton distribution function sets do not have rigorously 
quantified uncertainties. In recent years it has become apparent that 
these uncertainties play an important role in the interpretation of 
hadron collider data. In this paper, using the framework of statisti- 
cal inference, we illustrate a technique that can be used to efficiently 
propagate the uncertainties to new observables, assess the compati- 
bility of new data with an initial fit, and, in case the compatibility is 
good, include the new data in the fit. 



1 Introduction 



Current standard sets of Parton Distribution Functions (PDF's) do not include
uncertainties [1]. In practice, as long as the PDF's are used to calculate
observables that themselves have large experimental uncertainties, this
shortcoming is not a problem. In the past the precision of the hadron
collider data was such that there was no ostensible need for the PDF
uncertainties, as was testified by the good agreement between the theory and
measurements. However, the need for PDF uncertainties became apparent
with the measurement of the one jet inclusive transverse energy at the
Tevatron []. At large transverse jet energies the data were significantly
above the theoretical prediction, a possible signal for new physics. The
deviation was ultimately "fixed" by changing the PDF's in such a manner that
they still were consistent with the observables used to determine the
PDF's []. This is a reflection of the significant PDF uncertainties for this
observable. Knowing the uncertainties on the PDF's would have cleared the
situation immediately. Note that once the data are used in the PDF fit, they
cannot be used for other purposes, specifically for setting limits on
possible physics beyond the Standard Model. In that case, one should fit the
PDF's and the new physics simultaneously. The technique presented in this
paper is well suited for this sort of problem.

The spread between different sets of PDF's is often associated with PDF
uncertainties. Currently, this is what is used for the determination of the
PDF uncertainty on the W-boson mass at the Tevatron. It is not possible to
argue that this spread is an accurate representation of all experimental and
theoretical PDF uncertainties. For the next planned high luminosity run at
Fermilab, assuming an integrated luminosity of 2 fb$^{-1}$, the expected
40 MeV uncertainty on the W-boson mass is dominated by a 30 MeV production
model uncertainty. The latter uncertainty itself is dominated by the PDF
uncertainty, estimated to be 25 MeV []. This determination of the PDF
uncertainty is currently nothing more than an educated guess. It is made
by ruling out existing PDF's using the lepton charge asymmetry in W-boson
decay events. The spread of the remaining PDF's determines the uncertainty
on the extracted W-boson mass. Because the PDF uncertainty seems to be
the dominant source of uncertainty in the determination of the W-boson
mass, such a procedure must be replaced by a more rigorous quantitative
approach. The method described in this paper is well suited for this purpose.

In this paper, using the framework of statistical inference [], we
illustrate a method that can be used for many purposes. First of all, it is
easy to propagate the PDF uncertainties to a new observable without the need
to calculate the derivative of the observable with respect to the different
PDF parameters. Secondly, it is straightforward to assess the compatibility
of new data with the current fit and determine whether the new data should be
included in the fit. Finally, the new data can be included in the fit without
redoing the whole fit.

This method is significantly different from more traditional approaches to 
fit the PDF's to the data. It is very flexible and, besides solving the problems
already mentioned, it offers additional advantages. First, the experimental 
uncertainties and the probability density distributions for the fitted parame- 
ters do not have to be Gaussian distributed. However, such a generalization 
would require a significant increase in computer resources. Second, once a fit 
has been made to all the data sets, a specific data set can be easily excluded 
from the fit. Such an option is important in order to be able to investigate 
the effect of the different data sets. This is particularly useful in the case of 
incompatible new data. In that case one can easily investigate the origin of 
the incompatibility. Finally, because it is not necessary to redo a global fit 
in order to include a new data set, experimenters can include their own new 
data into the PDF's during the analysis phase. 

The outline for the rest of the paper is as follows. In Sec. 2 we describe the
inference method. The flexibility and simplicity of the method are illustrated
in Sec. 3, by applying it to the CDF one jet inclusive transverse jet energy
distribution [] and the CDF lepton charge asymmetry data [7]. In Sec. 4 we
draw our conclusions and outline future improvements and extensions to our
method.

2 The Method of Inference 

Statistical inference requires an initial probability density distribution for 
the PDF parameters. This initial distribution can be rather arbitrary, in 
particular it can be solely based on theoretical considerations. Once enough 
experimental data are used to constrain the probability density distribution of 
the parameters the initial choices become irrelevant []. Obviously, the initial
choice does play a role at intermediate stages. The initial distribution can 
also be the result of a former fit to other data. The data that we will use later 
in this paper do not constrain the PDF's enough by themselves to consider 
using an initial distribution based only on theory. The final answer would 
depend too much on our initial guess. We therefore decided to use the results 
of Ref. []. In this work the probability density distribution was assumed to
be Gaussian distributed and was constrained using Deep Inelastic Scattering 
(DIS) data. All the experimental uncertainties, including correlations, were 
included in the fit, but no theoretical uncertainties were considered. The 
fact that no Tevatron data were used allows us to illustrate the method with 
Tevatron data. We briefly summarize Ref. [] in the appendix.

In Sec. 2.1 we explain the propagation of the uncertainty to new observ- 
ables. Sec. 2.2 shows how the compatibility of new data with the PDF can be 
estimated. Finally, in Sec. 2.3 we demonstrate how the effect of new data can 
be included in the PDF's by updating the probability density distribution of 
the PDF parameters. 

2.1 Propagation of the uncertainty 

We now assume that the PDF's are parametrized at an initial factorization
scale $Q_0$, with $N_{par}$ parameters, $\{\lambda\} = \lambda_1, \lambda_2,
\ldots, \lambda_{N_{par}}$, and that the probability density distribution is
given by $P_{init}(\lambda)$. Note that $P_{init}(\lambda)$ does not have to
be a Gaussian distribution.

By definition $P_{init}(\lambda)$ is normalized to unity,

$$
\int_V P_{init}(\lambda)\, d\lambda = 1 , \tag{1}
$$

where the integration is performed over the full multi-dimensional parameter
space and $d\lambda \equiv \prod_{i=1}^{N_{par}} d\lambda_i$. To calculate the
parameter space integrals we use a Monte-Carlo (MC) integration approach with
importance sampling. We generate $N_{pdf}$ random sets of parameters
$\{\lambda\}$ distributed according to $P_{init}(\lambda)$. This choice should
minimize the MC uncertainty for most of the integrals we are interested in.
For reference we also generate one set at the central values of the
$\{\lambda\}$, the $\mu_{\{\lambda\}}$. The number of parameter sets to
be used depends on the quality of the data. The smaller the experimental
uncertainty is compared to the PDF uncertainty, the more PDF's we need.
We must ensure that a sufficient fraction of PDF's span the region of interest
(i.e. close to the data). For the purposes of this paper, we found that
$N_{pdf} = 100$ is adequate. Clearly, to each of the $N_{pdf}$ sets of
parameters $\{\lambda\}$ corresponds a unique set of PDF's. Each of these PDF
sets has to be evolved using the Altarelli-Parisi evolution equations. We used
the CTEQ package to do this evolution [].

1 The standard PDF sets of Ref. [] basically assume that the initial
probability density distribution for the parameters is uniform.

2 Recent PDF sets have also included the Tevatron data that we will use, but
none of these sets included uncertainties.

We can now evaluate any integral $I$ over the parameter space as a finite
sum

$$
I = \int_V f(\lambda)\, P_{init}(\lambda)\, d\lambda
  \approx \frac{1}{N_{pdf}} \sum_{j=1}^{N_{pdf}} f(\lambda^{(j)})
  = \langle f \rangle , \tag{2}
$$

where $\lambda^{(j)}$ is the $j$-th random set of $\{\lambda\}$. The function
$f$ represents an integrable function of the PDF parameters. The uncertainty
on the integral $I$ due to the MC integration is given by

$$
\delta I = \sqrt{\frac{\langle f^2 \rangle - \langle f \rangle^2}{N_{pdf}}} . \tag{3}
$$
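
As an illustration of Eqs. 2 and 3, the following minimal Python sketch
estimates an integral and its MC uncertainty from a list of function values
evaluated on the sampled parameter sets. The one-parameter Gaussian used to
stand in for $P_{init}$, the test function and the name `mc_average` are
assumptions made for this example only; they are not part of the analysis.

```python
import math
import random

def mc_average(values):
    """Return (<f>, delta_I) for f evaluated on sets drawn from P_init."""
    n = len(values)
    mean = sum(values) / n                      # Eq. 2
    mean_sq = sum(v * v for v in values) / n
    # max(..., 0.0) guards against a tiny negative value from round-off.
    err = math.sqrt(max(mean_sq - mean * mean, 0.0) / n)   # Eq. 3
    return mean, err

# Toy stand-in for P_init: a single unit-Gaussian parameter (illustrative).
rng = random.Random(1)
lam = [rng.gauss(0.0, 1.0) for _ in range(100)]       # N_pdf = 100
estimate, error = mc_average([l * l for l in lam])    # f(lambda) = lambda^2
```

Because the sets are drawn from $P_{init}$ itself, no explicit weight appears
in the sum; this is the importance-sampling choice described above.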



For any quantity, $x(\lambda)$, that depends on the PDF parameters
$\{\lambda\}$ (for example an observable, one of the flavor PDF's or for that
matter one of the parameters itself), the theory prediction is given by its
average value, $\mu_x$, and its uncertainty, $\sigma_x$ (footnote 3):

$$
\begin{aligned}
\mu_x &= \int_V x(\lambda)\, P_{init}(\lambda)\, d\lambda
  \approx \frac{1}{N_{pdf}} \sum_{j=1}^{N_{pdf}} x(\lambda^{(j)}) , \\
\sigma_x^2 &= \int_V \left( x(\lambda) - \mu_x \right)^2 P_{init}(\lambda)\, d\lambda
  \approx \frac{1}{N_{pdf}} \sum_{j=1}^{N_{pdf}} \left( x(\lambda^{(j)}) - \mu_x \right)^2 .
\end{aligned} \tag{4}
$$

3 If the uncertainty distribution is not Gaussian the average and the standard
deviation might not properly quantify the distribution.

Note that $\mu_x$ is not necessarily equal to the value of $x(\lambda)$
evaluated at the central value of the $\{\lambda\}$. However, this is how
observables are evaluated if one has only access to PDF's without
uncertainties.

Given $y(\lambda)$, another quantity calculable from the $\{\lambda\}$, the
covariance of $x(\lambda)$ and $y(\lambda)$ is given by the usual expression:

$$
C_{xy} = \int_V \left( x(\lambda) - \mu_x \right) \left( y(\lambda) - \mu_y \right)
  P_{init}(\lambda)\, d\lambda
  \approx \frac{1}{N_{pdf}} \sum_{j=1}^{N_{pdf}}
  \left( x(\lambda^{(j)}) - \mu_x \right) \left( y(\lambda^{(j)}) - \mu_y \right) . \tag{5}
$$

The correlation between $x(\lambda)$ and $y(\lambda)$ is given by
$cor_{xy} = C_{xy} / (\sigma_x \sigma_y)$. For example, this can be used to
calculate the correlation between two experimental observables, between an
observable and one of the PDF parameters, or between an observable and a
specific flavor PDF at a fixed Bjorken-$x$.
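Using Eq. 3 with $f = x$ and $f = (x - \mu_x)(y - \mu_y)$, the MC uncertainty
on the average and (co)variance is given by

$$
\delta\mu_x = \frac{\sigma_x}{\sqrt{N_{pdf}}} , \qquad
\delta C_{xy} = \sqrt{\frac{\langle (x - \mu_x)^2 (y - \mu_y)^2 \rangle - C_{xy}^2}{N_{pdf}}} . \tag{6}
$$

The MC technique presented in this sub-section gives a simple way to
propagate uncertainties to a new observable, without the need for calculating
the derivatives of the observable with respect to the parameters.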
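
The sums of Eqs. 4 and 5 can be sketched in a few lines of Python. The
function name `moments` and the returned dictionary layout are assumptions of
this example; the inputs are simply the values of two quantities evaluated on
each of the $N_{pdf}$ parameter sets.

```python
import math

def moments(xs, ys):
    """Average, standard deviation, covariance and correlation over PDF sets."""
    n = len(xs)
    mu_x = sum(xs) / n
    mu_y = sum(ys) / n
    var_x = sum((x - mu_x) ** 2 for x in xs) / n          # Eq. 4
    var_y = sum((y - mu_y) ** 2 for y in ys) / n
    cov = sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / n   # Eq. 5
    return {
        "mu_x": mu_x,
        "sigma_x": math.sqrt(var_x),
        "cov": cov,
        "cor": cov / math.sqrt(var_x * var_y),
        # MC uncertainty on the average, from Eq. 3 with f = x.
        "delta_mu_x": math.sqrt(var_x / n),
    }
```

No derivative of the observable with respect to the parameters enters; only
the values of the observable on the sampled sets are needed.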




2.2 Compatibility of New Data 

We will assume that one or several new experiments, not used in the
determination of the initial probability density distribution, have measured a
set of $N_{obs}$ observables $\{x^e\} = x^e_1, x^e_2, \ldots, x^e_{N_{obs}}$.
The experimental uncertainties, including the systematic uncertainties, are
summarized by the $N_{obs} \times N_{obs}$ experimental covariance matrix
$C^{exp}$. Note that the correlations between experiments are easily
incorporated. Here, however, we have to assume that the new experiments are
not correlated with any of the experiments used in the determination of
$P_{init}$. The probability density distribution of $\{x^e\}$ is given by

$$
P(x^e) = \int_V P(x^e|\lambda)\, P_{init}(\lambda)\, d\lambda
  \approx \frac{1}{N_{pdf}} \sum_{j=1}^{N_{pdf}} P(x^e|\lambda^{(j)}) , \tag{7}
$$

where $P(x^e|\lambda)$ is the conditional probability density distribution
(often referred to as likelihood function). This distribution quantifies the
probability of measuring the specific set of experimental values $\{x^e\}$
given the set of PDF parameters $\{\lambda\}$. In PDF sets without
uncertainties, $P_{init}(\lambda)$ is a delta function and
$P(x^e) = P(x^e|\lambda)$.

Instead of dealing with the probability density distribution of Eq. 7, one
often quotes the confidence level to determine the agreement between the
data and the model. The confidence level is defined as the probability that
a repeat of the given experiment(s) would observe a worse agreement with
the model. The confidence level of $\{x^e\}$ is given by

$$
CL(x^e) = \int_V CL(x^e|\lambda)\, P_{init}(\lambda)\, d\lambda
  \approx \frac{1}{N_{pdf}} \sum_{j=1}^{N_{pdf}} CL(x^e|\lambda^{(j)}) , \tag{8}
$$

where $CL(x^e|\lambda)$ is the confidence level of $\{x^e\}$ given
$\{\lambda\}$. If $CL(x^e)$ is larger than an agreed value, the data are
considered consistent with the PDF and can be included in the fit. If it is
smaller, the data are inconsistent and we have to determine the source of
discrepancy.

For non-Gaussian uncertainties the calculation of the confidence level
might be ambiguous. In this paper we assume that the uncertainties are
Gaussian. The conditional probability density distribution and confidence
level are then given by

$$
P(x^e|\lambda) = \frac{1}{\sqrt{(2\pi)^{N_{obs}}\, |C^{tot}|}}\,
  e^{-\frac{1}{2} \chi^2_{new}(\lambda)} , \tag{9}
$$

$$
CL(x^e|\lambda) = \int_{\chi^2_{new}(\lambda)}^{\infty} P_{\chi^2}(z; N_{obs})\, dz , \tag{10}
$$

where $P_{\chi^2}(z; N_{obs})$ denotes the chi-squared probability density for
$N_{obs}$ degrees of freedom and

$$
\chi^2_{new}(\lambda) = \sum_{k,l=1}^{N_{obs}}
  \left( x^e_k - x^t_k(\lambda) \right) M^{tot}_{kl} \left( x^e_l - x^t_l(\lambda) \right) \tag{11}
$$


is the chi-squared of the new data. The theory prediction for the $k$-th
experimental observable, $x^t_k(\lambda)$, is calculated using the PDF set
given by the parameters $\{\lambda\}$. The matrix $M^{tot}$ is the inverse of
the total covariance matrix, $C^{tot}$, which in turn is given by the sum of
the experimental, $C^{exp}$, and theoretical, $C^{theor}$, covariance
matrices. We assume that there is no correlation between the experimental and
theoretical uncertainties. We will use a minimal value of 0.27% on the
confidence level, corresponding to a three sigma deviation, as a measure of
compatibility of the data with the theory. If the new data are consistent with
the theory prediction then the maximum of the distribution of the
$\chi^2_{new}$ should be close to $N_{obs}$ (within the expected
$\sqrt{2 N_{obs}}$ uncertainty). The standard deviation of $\chi^2_{new}$,
$\sigma_{\chi^2_{new}}$, tells us something about the relative size of the PDF
uncertainty compared to the size of the data uncertainty. The larger the value
of $\sigma_{\chi^2_{new}}$ is compared to $\sqrt{2 N_{obs}}$, the more the
data will be useful in constraining the PDF's.

Note that if there are several uncorrelated experiments, the total
$\chi^2_{new}$ is equal to the sum of the $\chi^2_{new}$ of the individual
experiments and the conditional probability is equal to the product of the
individual conditional probabilities.
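
The compatibility test can be sketched as follows, under the Gaussian
assumption made above. This is an illustrative Python sketch, not the code
used for the results in this paper: the helper names (`solve`, `chi2_new`,
`chi2_cl`, `average_cl`) are invented here, the linear system is solved by
plain Gaussian elimination instead of forming $M^{tot}$ explicitly, and the
chi-squared tail probability is built from the standard recursion of the
regularized incomplete gamma function.

```python
import math

def solve(matrix, rhs):
    """Solve matrix * z = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    a = [list(row) + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(a[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (a[r][n] - tail) / a[r][r]
    return z

def chi2_new(data, theory, cov_tot):
    """Chi-squared of the data for one PDF set, given the total covariance."""
    resid = [d - t for d, t in zip(data, theory)]
    return sum(r * z for r, z in zip(resid, solve(cov_tot, resid)))

def chi2_cl(chi2, n_obs):
    """Probability to find a chi-squared above `chi2` for n_obs observables."""
    x = 0.5 * chi2
    if n_obs % 2 == 0:
        a, q = 1.0, math.exp(-x)                  # exact for n_obs = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(x))       # exact for n_obs = 1
    while a < 0.5 * n_obs:
        # Q(a + 1, x) = Q(a, x) + x^a exp(-x) / Gamma(a + 1)
        q += x ** a * math.exp(-x) / math.gamma(a + 1.0)
        a += 1.0
    return q

def average_cl(data, theory_per_pdf, cov_tot):
    """Confidence level averaged over the PDF sets, as in Eq. 8."""
    cls = [chi2_cl(chi2_new(data, t, cov_tot), len(data))
           for t in theory_per_pdf]
    return sum(cls) / len(cls)
```

With one observable, a chi-squared of 9 gives a confidence level of about
0.27%, which is the three sigma threshold quoted above.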



2.3 Effect of new data on the PDF's

Once we have decided that the new data are compatible with the initial
PDF's, we can constrain the PDF's further. We do this within the formalism
of statistical inference, using Bayes' theorem. The idea is to update the
probability density distribution taking into account the new data. This new
probability density distribution is in fact the conditional probability
density distribution for the $\{\lambda\}$ considering the new data
$\{x^e\}$ and is given directly by Bayes' theorem

$$
P_{new}(\lambda) = P(\lambda|x^e)
  = \frac{P(x^e|\lambda)\, P_{init}(\lambda)}{P(x^e)} , \tag{12}
$$

where $P(x^e)$, defined in Eq. 7, acts as a normalization factor such that
$P(\lambda|x^e)$ is normalized to one. Because $P_{new}(\lambda)$ is
normalized to unity, we can replace $P(x^e|\lambda)$ in Eq. 12 simply by
$e^{-\chi^2_{new}/2}$. This factor acts as a new weight on each of the PDF's.

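We can now replace $P_{init}(\lambda)$ by $P_{new}(\lambda)$ in the expression
for the average, standard deviation and covariance given in Sec. 2.1 and
obtain predictions that include the effect of the new data. With the MC
integration technique described before, these quantities can be estimated by
weighted sums over the $N_{pdf}$ PDF sets

$$
\begin{aligned}
\mu_x &\approx \sum_{k=1}^{N_{pdf}} w_k\, x(\lambda^{(k)}) , \\
\sigma_x^2 &\approx \sum_{k=1}^{N_{pdf}} w_k \left( x(\lambda^{(k)}) - \mu_x \right)^2 , \\
C_{xy} &\approx \sum_{k=1}^{N_{pdf}} w_k \left( x(\lambda^{(k)}) - \mu_x \right)
  \left( y(\lambda^{(k)}) - \mu_y \right) ,
\end{aligned} \tag{13}
$$

where the weights are given by

$$
w_k = \frac{e^{-\frac{1}{2} \chi^2_{new}(\lambda^{(k)})}}
  {\sum_{l=1}^{N_{pdf}} e^{-\frac{1}{2} \chi^2_{new}(\lambda^{(l)})}} . \tag{14}
$$

Note that for the calculation of the Monte-Carlo uncertainty of the weighted
sums, the correlation between the numerator and denominator in Eq. 14 has
to be taken into account properly.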
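
A minimal Python sketch of the reweighting step, assuming the chi-squared of
the new data has already been computed for every PDF set. The function names
are invented for this example. Subtracting the smallest chi-squared before
exponentiating is a numerical convenience added here; it cancels in the
normalized ratio and only protects against floating-point underflow.

```python
import math

def weights_from_chi2(chi2_values):
    """Normalized weights w_k proportional to exp(-chi2_k / 2)."""
    chi2_min = min(chi2_values)
    raw = [math.exp(-0.5 * (c - chi2_min)) for c in chi2_values]
    total = sum(raw)
    return [r / total for r in raw]

def weighted_moments(xs, ws):
    """Weighted average and variance of a quantity over the PDF sets."""
    mu = sum(w * x for w, x in zip(ws, xs))
    var = sum(w * (x - mu) ** 2 for w, x in zip(ws, xs))
    return mu, var
```

For example, two PDF sets whose chi-squared values differ by $2 \ln 3$ receive
weights 0.75 and 0.25.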

Our strategy is very flexible. Once the theory predictions $x^t_k(\lambda)$
using the $N_{pdf}$ PDF sets are known for each of the experiments, it is
trivial to include or exclude the effect of one of the experiments on the
probability density distribution. If the different experiments are
uncorrelated then all that is needed is the $\chi^2_{new}$ of each individual
experiment for all the PDF sets. In that case, each experiment is compressed
into $N_{pdf}$ values of $\chi^2_{new}$.

One other advantage is that all the needed $x^t_k(\lambda)$ can be calculated
beforehand in a systematic manner, whereas standard chi-squared or maximum
likelihood fits require many evaluations of $x^t_k(\lambda)$ during the fit as
the parameters are changed in order to find the extremum. These methods are
not very flexible, as a new fit is required each time an experiment is added
or removed.
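
The modularity described above can be sketched as follows. Each uncorrelated
experiment is stored as a list of $N_{pdf}$ chi-squared values, and including
or excluding an experiment amounts to choosing which lists enter the sum. The
dictionary layout, the experiment labels and the numbers are made up for this
illustration.

```python
import math

def combined_weights(chi2_table, include):
    """Weights from the summed chi-squared of the selected experiments.

    chi2_table[name][k] is the chi-squared of experiment `name` for PDF set k.
    """
    n_pdf = len(next(iter(chi2_table.values())))
    total = [sum(chi2_table[name][k] for name in include)
             for k in range(n_pdf)]
    lowest = min(total)   # cancels in the ratio; avoids underflow
    raw = [math.exp(-0.5 * (t - lowest)) for t in total]
    norm = sum(raw)
    return [r / norm for r in raw]

# Made-up chi-squared values for three PDF sets and two experiments.
chi2_table = {"jets": [30.1, 35.4, 28.7], "asymmetry": [9.2, 8.1, 14.0]}
w_all = combined_weights(chi2_table, ["jets", "asymmetry"])
w_jets_only = combined_weights(chi2_table, ["jets"])
```

Selecting no experiment at all returns equal weights, i.e. the initial
probability density distribution.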

The new probability density distribution of the PDF parameters is Gaussian
if the following three conditions are met. First, the initial probability
density distribution, $P_{init}(\lambda)$, must be Gaussian. Second, all the
uncertainties on the data points must be Gaussian distributed (that includes
systematic and theoretical uncertainties). Finally, the theory predictions,
$x^t_k(\lambda)$, must be linear in $\{\lambda\}$ in the region of interest.
This last requirement is fulfilled once the PDF uncertainties are small
enough. For the studies in this paper all three requirements are fulfilled.
The new probability density distribution can therefore be characterized by the
average value of the parameters and their covariance matrix, which can be
calculated, together with their MC integration uncertainty, using Eq. 13.
Once the new values of the average and the covariance matrix have been
calculated, a new set of PDF parameters can be generated according to the new
distribution and used to make further predictions instead of using the initial
set of PDF's with the weights.

An alternative way to generate a PDF set distributed according to
$P_{new}(\lambda)$ is to unweight the now weighted initial PDF set. The
simplest way to unweight the PDF sets is to use a rejection algorithm. That
is, define $w_{max}$ as the largest of the $N_{pdf}$ weights given in Eq. 14.
Next generate for each PDF set a uniform stochastic number, $r_k$, between
zero and one. If the weight $w_k$ is larger than or equal to
$r_k \times w_{max}$ we keep PDF set $k$, otherwise it is discarded.
The surviving PDF's are now distributed according to $P_{new}(\lambda)$. The
number of surviving PDF's is on average given by
$N^{new}_{pdf} = 1/w_{max}$. We can now apply all the techniques of the
previous sub-sections, using the new unweighted PDF set. The MC integration
uncertainties are easily estimated using the expected number of surviving
PDF's. In the extreme case that $w_{max}$ is close to one and only a few PDF's
survive the unweighting procedure, the number of initial PDF's must be
increased. The other extreme occurs when all the weights are approximately
equal, i.e. $w_k \sim 1/N_{pdf}$. In that case the new data put hardly any
additional constraints on the PDF's.
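
The rejection step can be sketched in Python as below. The function names and
the use of a seeded `random.Random` instance are choices made for this
example; the acceptance condition is the one stated in the text.

```python
import random

def unweight(weights, rng):
    """Indices of the PDF sets that survive the rejection step."""
    w_max = max(weights)
    # Keep set k if w_k >= r_k * w_max, with r_k uniform in [0, 1).
    return [k for k, w in enumerate(weights) if w >= rng.random() * w_max]

def expected_survivors(weights):
    """Average number of surviving sets; 1 / w_max if the weights sum to one."""
    return sum(weights) / max(weights)

rng = random.Random(0)
survivors = unweight([0.5, 0.5, 0.0, 0.0], rng)
```

If all weights are equal, every set survives, which corresponds to new data
that add no constraint.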

The $\chi^2_{new}$ is only used to calculate the weight of a particular PDF,
so that the new probability density distribution of the PDF parameters can be
determined. We do not perform a chi-squared fit. However, if the new
probability density distribution of the parameters is Gaussian distributed
then our method is equivalent to a chi-squared fit. In that case the average
value of the parameters corresponds to the maximum of the probability density
distribution. The minimum chi-squared can be estimated (with MC
uncertainties) from the average $\chi^2_{new}$ calculated with the new
probability density distribution. Indeed, by definition this average must be
equal to the minimum chi-squared, $\chi^2_{min}$, plus the known number of
parameters. Note that the variance of the $\chi^2_{new}$ must itself be equal
to twice the number of parameters. To obtain the overall minimum chi-squared,
the value of the minimum chi-squared of the initial fit must be added to
$\chi^2_{min}$. As long as the confidence level of the new data that were
included in the fit is sufficiently high, the overall minimum chi-squared
obtained is guaranteed to be in accordance with expectations (footnote 4).



3 Expanding the PDF sets 

The viability of the method described in Sec. 2 is studied using two CDF 
measurements. In Sec. 3.1 the one jet inclusive transverse energy distribution
is considered, while the lepton charge asymmetry in W-boson decay is
examined in Sec. 3.2. The statistical, systematic, and theoretical uncertainties
on the observables will be taken into account. 



3.1 The one jet inclusive measurement 

The CDF results on the one jet inclusive transverse energy distribution []
demonstrated the weakness of the current standard PDF sets due to the
absence of uncertainties on the PDF parameters.

The observables are the inclusive jet cross section at different transverse
energies, $E_T^i$ (footnote 5),

$$
x_i = \frac{d\sigma}{dE_T}(E_T^i) . \tag{15}
$$

4 We are assuming that the initial $\chi^2_{min}$ was within expectations.

5 To be more precise, the inclusive jet cross section in different bins of
transverse energy. In the numerical results presented here we take the finite
binning effects into account.

We first have to construct the experimental covariance matrix, $C^{exp}$,
using the information contained in Ref. []. The paper lists the statistical
uncertainty at the different experimental points, $\Delta_0(E_T^i)$, together
with eight independent sources of systematic uncertainties,
$\Delta_k(E_T^i)$. Hence, the experimental measurements, $x^e_i$, are given by

$$
x^e_i = x^t_i(\lambda) + \eta^i_0\, \Delta_0(E_T^i)
  + \sum_{k=1}^{8} \eta_k\, \Delta_k(E_T^i) , \tag{16}
$$

where as before, $x^t_i(\lambda)$ is the theoretical prediction for the
observable calculated with the set of parameters $\{\lambda\}$. The
$\eta^i_0$ and $\eta_k$ are independent random variables normally distributed
with zero average and unit standard deviation. Note that some of the
systematic uncertainties given in Ref. [] are asymmetric. In those cases we
symmetrized the uncertainty using the average deviation from zero. From
Eq. 16 we can construct the experimental covariance matrix

$$
C^{exp}_{ij} = \left( \Delta_0(E_T^i) \right)^2 \delta_{ij}
  + \sum_{k=1}^{8} \Delta_k(E_T^i)\, \Delta_k(E_T^j) . \tag{17}
$$
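
The construction of the experimental covariance matrix from uncorrelated
statistical uncertainties and fully bin-to-bin correlated systematic sources
can be sketched as below. The function names and the list-of-lists layout are
assumptions of this example, and `symmetrize` is one possible reading of "the
average deviation from zero" for an asymmetric uncertainty.

```python
def experimental_covariance(stat, systematics):
    """Covariance from stat[i] = Delta_0 in bin i and systematics[k][i] = Delta_k."""
    n = len(stat)
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Each systematic source is fully correlated between bins.
            cov[i][j] = sum(s[i] * s[j] for s in systematics)
        # The statistical uncertainty only contributes to the diagonal.
        cov[i][i] += stat[i] ** 2
    return cov

def symmetrize(up, down):
    """Average deviation from zero of an asymmetric systematic shift."""
    return 0.5 * (abs(up) + abs(down))
```

The resulting matrix can be passed, after adding the theoretical covariance,
to any chi-squared routine that accepts a full covariance matrix.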

We also need to estimate the theoretical uncertainty. In Eq. 16 no
theoretical uncertainties were taken into account. We consider two types of
uncertainties: the uncertainty due to the numerical Monte Carlo integration
over the final state particle phase space, $\Delta_{MC}(E_T)$, and the
renormalization/factorization scale, $\mu$, uncertainty, $\Delta_\mu(E_T)$.
The theoretical prediction in Eq. 16 must then be replaced by

$$
x^t_i(\lambda) \to \frac{d\sigma^{NLO}}{dE_T}(E_T^i, \lambda, \mu)
  + \eta^i_{MC}\, \Delta_{MC}(E_T^i) + \eta_\mu\, \Delta_\mu(E_T^i) , \tag{18}
$$

from which we can derive the theoretical covariance matrix

$$
C^{theor}_{ij} = \left( \Delta_{MC}(E_T^i) \right)^2 \delta_{ij}
  + \Delta_\mu(E_T^i)\, \Delta_\mu(E_T^j) . \tag{19}
$$

Here we assume that there is no bin to bin correlation in the MC uncertainty.
On the other hand, we take the correlation of the scale uncertainty fully into
account. Both $\Delta_{MC}$ and $\Delta_\mu$ are evaluated at the central
values of the PDF parameters, assuming that the variation is small.

We evaluate the scale uncertainty in a very straightforward manner. For
the central prediction the renormalization and factorization scales are taken
to be equal to half the transverse energy of the leading jet in the event,
$\mu = \frac{1}{2} E_T^{max}$. To estimate the uncertainty we make another
theoretical prediction, now choosing as a scale $\mu = E_T^{max}$. The
"one-sigma" uncertainty is defined as

$$
\Delta_\mu(E_T) = \frac{d\sigma^{NLO}}{dE_T}\left( E_T, \mu_{\{\lambda\}}, \mu = \tfrac{1}{2} E_T^{max} \right)
  - \frac{d\sigma^{NLO}}{dE_T}\left( E_T, \mu_{\{\lambda\}}, \mu = E_T^{max} \right) . \tag{20}
$$

As we will see later in this section the theoretical uncertainties are small 
compared to the other uncertainties. Therefore this crude estimate suffices 
for the purposes of this paper. In the future a more detailed study of the 
theoretical uncertainty is required. The scale uncertainty is often associated 
with the theoretical uncertainty due to the truncation of the perturbative 
series. However, it is important to realize this is only a part of the full 
theoretical uncertainty. 
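
A short Python sketch of the theoretical covariance matrix, combining an
uncorrelated phase space MC uncertainty with a fully correlated scale
uncertainty taken as the difference of two predictions. The function and
argument names are invented for this illustration; the two prediction lists
stand for cross sections computed per bin at the two scale choices.

```python
def theoretical_covariance(mc_err, pred_half_et, pred_et):
    """Covariance from MC errors and the scale shift between two predictions."""
    # Scale uncertainty per bin: prediction at mu = E_T^max / 2 minus
    # prediction at mu = E_T^max.
    delta_mu = [a - b for a, b in zip(pred_half_et, pred_et)]
    n = len(mc_err)
    cov = [[delta_mu[i] * delta_mu[j] for j in range(n)] for i in range(n)]
    for i in range(n):
        cov[i][i] += mc_err[i] ** 2     # no bin to bin MC correlation
    return cov

def total_covariance(c_exp, c_theor):
    """Sum of experimental and theoretical covariance, assumed uncorrelated."""
    return [[e + t for e, t in zip(row_e, row_t)]
            for row_e, row_t in zip(c_exp, c_theor)]
```

Only the outer product of the scale shifts carries bin-to-bin correlation,
which mirrors the treatment of a single systematic source in the experimental
covariance matrix.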

In Fig. 1a we present results for the single inclusive jet cross section as a
function of the jet transverse energy. Both data and theoretical predictions
are divided by the average prediction of the initial PDF's. The NLO
predictions are calculated using the JETRAD program []. The inner (outer)
error bars on the experimental points represent the diagonal part of the
experimental (total) covariance matrix. The dotted lines represent the initial
one-sigma PDF uncertainties. The solid lines are the theory predictions
calculated with the new PDF's (i.e., the new probability density
distribution). The plot is somewhat misleading because of the large
point-to-point correlation of the uncertainties. The confidence level of 50%
is very high, indicating a good agreement between the prediction and the data.

This leads us to the conclusion that the one jet inclusive transverse
energy distribution is statistically in agreement with the NLO theoretical
expectation based on the initial probability density distribution of the PDF
parameters. No indication of new physics is present. Note that the prediction
using the initial PDF's differs quite a bit from the more traditional fits
such as MRSD0, see the dashed line in Fig. 1a. Having no uncertainties on
the traditional fits, it is hard to draw any quantitative conclusion from this
observation. The larger value of the jet cross section calculated using the
initial PDF set at high transverse energies compared to MRSD0 was
anticipated in Ref. [] and can probably be traced back to the larger d and u
quark distributions at the reference scale $Q_0$ and moderate $x \sim 0.2$.
This difference in turn was partially attributed to the different way of
treating target mass and Fermi motion corrections.

Figure 1: (a) Single inclusive jet cross section as a function of the jet
transverse energy. The results are divided by the average prediction
calculated with the initial PDF's. The data points are the CDF run 1a results.
The dotted lines represent the initial one-sigma PDF uncertainties. The solid
lines are the theory predictions calculated with the new PDF's. The inner
(outer) error bars on the data points are the diagonal entries of the
experimental (total) covariance matrix. The dashed line is the prediction
obtained with the MRSD0 PDF set. (b) The one-sigma correlation contour between
the strong coupling constant $\alpha_s(M_Z)$ and the $\beta$-parameter in the
gluon PDF ($\sim x^\alpha (1-x)^\beta$ at the initial factorization scale)
calculated for both the initial and new PDF's.

Given the confidence level of 50% the one jet inclusive data can be
included in the fit. Using Eq. 11 we calculate for each PDF set $k$ the
corresponding $\chi^2_{new}(\lambda^{(k)})$. This gives us the 100 weights
$w_k$ (conditional probabilities) defined in Eq. 14. Using Eq. 13, we can
calculate the effects of including the CDF data into the fit. The results are
shown in Figs. 1a and 1b. As can be seen in Fig. 1a, the effect is that the
central value is pulled closer to the data and the PDF uncertainty is reduced
substantially. Two of the fourteen PDF parameters are affected the most. As
expected these are the strong coupling constant $\alpha_s(M_Z)$ and the gluon
PDF coefficient $\beta$, which controls the high $x$ behavior (the gluon PDF
is proportional to $x^\alpha (1-x)^\beta$ at the initial scale). In Fig. 1b we
show the correlation between these two parameters before and after the
inclusion of the CDF data. As can be seen the impact on $\beta$ is very
significant. Similarly, the uncertainty on $\alpha_s$ is reduced substantially
and the correlation between the two parameters is also changed. This indicates
that the one jet inclusive transverse energy distribution in itself has a
major impact on the uncertainty of $\alpha_s$ and the determination of the
gluon PDF. Note that we do not address the issue of the parametrization
uncertainty. Other choices of how to parametrize the initial PDF's will change
the results. To obtain a value and uncertainty of $\alpha_s(M_Z)$ which is on
the same footing as the one obtained from $e^+e^-$ colliders, one needs to
address this issue.



3.2 The lepton charge asymmetry measurement 

Our second example is the lepton charge asymmetry in W-boson decay at
the Tevatron. As already explained, this observable is important for the
reduction of the PDF uncertainties in the W-boson mass extraction at hadron
colliders. The asymmetry is given by

$$
A(\eta_e) = \frac{N^+(\eta_e) - N^-(\eta_e)}{N^+(\eta_e) + N^-(\eta_e)} , \tag{21}
$$

where $N^+$ and $N^-$ are respectively the number of positrons and electrons
at the pseudo-rapidity $\eta_e$.
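
For concreteness, the asymmetry of Eq. 21 in one pseudo-rapidity bin can be
computed as in the Python sketch below. The statistical uncertainty shown is
the standard expression for a counting asymmetry with independent counts; it
is included here as an assumption of the example and is not taken from the
CDF analysis.

```python
import math

def charge_asymmetry(n_plus, n_minus):
    """Lepton charge asymmetry from positron and electron counts in one bin."""
    return (n_plus - n_minus) / (n_plus + n_minus)

def asymmetry_stat_error(n_plus, n_minus):
    """Statistical uncertainty sqrt((1 - A^2) / N) for independent counts."""
    total = n_plus + n_minus
    a = charge_asymmetry(n_plus, n_minus)
    return math.sqrt((1.0 - a * a) / total)
```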

In Fig. 2a, we show the preliminary CDF data of run 1b (solid points) for
the asymmetry, along with the NLO predictions (dotted lines) including the
PDF uncertainties, relative to the theory average prediction using the initial
PDF's. For the NLO calculations the DYRAD program [] was used. The
inner error bars on the experimental points are the statistical uncertainties;
the systematic uncertainties are small and we can safely neglect them. The
outer error bars are the diagonal of the total covariance matrix. In this
case, the theoretical uncertainty is dominated by the phase space Monte
Carlo integration uncertainty; we took its bin to bin correlation into account.
Similar to the one jet inclusive transverse energy case, the scale uncertainty
is defined by the difference between the theoretical predictions calculated
using two scales, $\mu = M_W$ and $\mu = 2 M_W$.

Figure 2: (a) The lepton charge asymmetry as a function of the lepton
pseudo-rapidity. The results are normalized to the theory prediction using
the average value of the initial PDF's. The data are the CDF run 1b
preliminary results. The error bars, dotted and solid lines have the same
definition as in Fig. 1. (b) The ratio $R(y_W)$ normalized as in (a) as a
function of the W-boson rapidity. The dotted and solid lines are defined as in
Fig. 1.

As is clear from Fig. 2a, there is a good agreement between the data and
the NLO prediction, except for the last experimental point at the highest 
pseudo-rapidity. The confidence level including the last point is well below 
our threshold of 0.27%. In order to be able to include the data into the PDF 
fit we decided to simply exclude this data point from our analysis. Without 
the highest pseudo-rapidity point we obtain a reasonable confidence level of 
4%. It is not as good as in the single inclusive jet case even though the plots 
appear to indicate otherwise. The reason for this is the absence of significant 
point-to-point correlation for the charge asymmetry uncertainties. 

We can now include the lepton charge asymmetry data into the fit by
updating the probability density distribution with Bayes' theorem, as
described in the previous section. In Fig. 2a the predictions obtained with
the new probability density distribution are shown by the solid lines. As
expected, the data are pulling the theory down and reducing the PDF
uncertainties.

It is difficult to correlate the change in the asymmetry to a change in
a particular PDF parameter. On the other hand, it is well known that the
lepton asymmetry can be approximately related to the following asymmetry
of the ratio of up quark ($u$) and down quark ($d$) distribution functions

$$
R(y_W) = \frac{\dfrac{u(x_1)}{d(x_1)} - \dfrac{u(x_2)}{d(x_2)}}
  {\dfrac{u(x_1)}{d(x_1)} + \dfrac{u(x_2)}{d(x_2)}} . \tag{22}
$$

The Bjorken-$x$ are given by

$$
x_{1,2} = \frac{M_W}{\sqrt{s}}\, e^{\pm y_W} , \tag{23}
$$

where $M_W$ is the mass of the W-boson, $\sqrt{s}$ the center of mass energy
of the collider, and $y_W$ the W-boson rapidity. The PDF's were evaluated with
the factorization scale equal to $M_W$. The ratio $R(y_W)$ is approximately
the W-boson asymmetry and obviously is sensitive to the slope of the $u/d$
ratio.
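
The ratio $R(y_W)$ can be evaluated with a few lines of Python once the
up and down quark distributions are available as functions of $x$. In the
sketch below the default values $M_W = 80.4$ GeV and $\sqrt{s} = 1800$ GeV,
as well as the toy shapes `toy_u` and `toy_d`, are assumptions made purely for
illustration; they are not the fitted PDF's of this paper.

```python
import math

def bjorken_x(y_w, m_w=80.4, sqrt_s=1800.0):
    """Momentum fractions x_1, x_2 for W rapidity y_w (masses in GeV)."""
    scale = m_w / sqrt_s
    return scale * math.exp(y_w), scale * math.exp(-y_w)

def ratio_asymmetry(y_w, u, d, m_w=80.4, sqrt_s=1800.0):
    """Asymmetry of the u/d ratio between x_1 and x_2."""
    x1, x2 = bjorken_x(y_w, m_w, sqrt_s)
    r1 = u(x1) / d(x1)
    r2 = u(x2) / d(x2)
    return (r1 - r2) / (r1 + r2)

# Toy shapes for illustration only: u/d rises with x.
def toy_u(x):
    return x ** -0.5 * (1.0 - x) ** 3

def toy_d(x):
    return x ** -0.5 * (1.0 - x) ** 4
```

By construction the ratio vanishes at $y_W = 0$ and is antisymmetric under
$y_W \to -y_W$, which makes its sensitivity to the slope of $u/d$ explicit.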
In Fig. |2p we show the ratio $R(y_W)$ calculated with both the initial and
the new probability density distributions. As can be seen, the change is very
similar to the change experienced by the lepton charge asymmetry itself.
The change in $R(y_W)$ can be traced to a simultaneous increase in the anti-up
quark distribution and decrease in the anti-down quark distribution at low
x.

4 Conclusions and Outlook

Current standard sets of PDF's do not include uncertainties. It is clear that
we can not continue to discount them. Already current measurements at the
Tevatron have highlighted the importance of these uncertainties for the search
for physics beyond the Standard Model. Furthermore, the potential of future
hadron colliders to measure $\alpha_s(M_Z)$ and the W-boson mass is impressive, but
can not be disentangled from PDF uncertainties. The physics at the LHC
will also undoubtedly require a good understanding of the PDF uncertainties.
On a more general level, if we want to quantitatively test the framework of 
perturbative QCD over a very large range of parton collision energies the 
issue of PDF uncertainties can not be sidestepped. 

In this paper we have illustrated a method, based on statistical inference,
that can be used to easily propagate uncertainties to new observables, assess
the compatibility of new data, and, if the latter is good, include the effect of
the new data on the PDF's without having to redo the whole fit. The method
is versatile and modular: an experiment can be included in or excluded from 
the new PDF fit without any additional work. The statistical and systematic 
uncertainties with the full point-to-point correlation matrix can be included 
as well as the theoretical uncertainties. None of the uncertainties are required 
to be Gaussian distributed. 

One remaining problem is the uncertainty associated with the choice of
parametrization of the input PDF. This is a difficult problem that does not
have a clear answer yet and will require a compromise between the number of
parameters and the smoothness of the PDF. We plan to address this question
in another paper. The next phase would then be to obtain a large number
of initial PDF sets based on theoretical considerations only, in the spirit of
the inference method and Bayes' theorem. The DIS and Tevatron data could
then be used to constrain the range of these PDF's, resulting in a set of
PDF's which would include both experimental and theoretical uncertainties.



A Appendix: Input PDF 

For our initial PDF parameter probability density distribution we use the
results of Ref. ||. There a chi-squared fit was performed to DIS data from
BCDMS 0, NMC g|], H1 @ and ZEUS @. Both statistical uncertainties
and experimental systematic uncertainties with point-to-point correlations
were included in the fit, assuming Gaussian distributions. However, no
theoretical uncertainties were considered. It is important to include the
correlation of the systematic uncertainties because the latter usually dominate
in DIS data. Simply adding them in quadrature to the statistical uncertainty
would result in an overestimation of the uncertainty.
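
One simple situation in which ignoring the correlation overestimates an uncertainty is a shape-like quantity, such as the difference of two measurements that share a common systematic shift. The Python sketch below illustrates this with invented numbers and hypothetical helper names; it is a toy example and not a description of the DIS fit itself.

```python
import math

def covariance(stat, syst, correlated=True):
    """Covariance of measurements with statistical and systematic parts.

    With correlated=True the systematic part is fully correlated between
    points; with correlated=False it is only added in quadrature on the
    diagonal.
    """
    n = len(stat)
    cov = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                cov[i][j] = stat[i] ** 2 + syst[i] ** 2
            elif correlated:
                cov[i][j] = syst[i] * syst[j]
    return cov

def sigma_of_difference(cov):
    """Uncertainty on y_0 - y_1 from a 2x2 covariance matrix."""
    return math.sqrt(cov[0][0] + cov[1][1] - 2.0 * cov[0][1])

stat = [0.01, 0.01]  # toy statistical uncertainties
syst = [0.03, 0.03]  # toy systematic uncertainties, common to both points

sigma_corr = sigma_of_difference(covariance(stat, syst, correlated=True))
sigma_quad = sigma_of_difference(covariance(stat, syst, correlated=False))
print(f"with correlation: {sigma_corr:.4f}")
print(f"quadrature only:  {sigma_quad:.4f}")
```

In this toy case the common systematic shift cancels in the difference when the correlation is kept, whereas the quadrature treatment retains it in full.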

A standard parametrization at $Q_0^2 = 9$ GeV$^2$ is used with 14 ($= N_{par}$)
parameters: $x d_v$, $x g$, $x \bar{d}$, $x \bar{u}$, and $x s$ are parametrized using the functional
form $x^{\lambda_i} (1 - x)^{\lambda_j}$, whereas $x u_v$ is parametrized as
$x^{\lambda_i} (1 - x)^{\lambda_j} (1 + \lambda_k x)$. Here x
is the Bjorken-x. Parton number and momentum conservation constraints are
imposed. The full covariance matrix of the parameters, $C^{init}$, is extracted at
the same time as the values of the parameters that minimize the chi-squared.
The uncertainties on the parameters were assumed to be Gaussian, such that
the fitted values also correspond to the average values of the parameters, $\mu_{\lambda_i}$.
The probability density distribution is then given by

$$P_{init}(\lambda) = \frac{e^{-\chi^2_{init}(\lambda)/2}}{\sqrt{(2\pi)^{N_{par}} \, |C^{init}|}} , \qquad (24)$$

where

$$\chi^2_{init}(\lambda) = \sum_{i,j}^{N_{par}} (\lambda_i - \mu_{\lambda_i}) \, M^{init}_{ij} \, (\lambda_j - \mu_{\lambda_j}) \qquad (25)$$

is the difference between the total chi-squared of the experimental data used
in the fit, evaluated with the PDF's fixed by the set of parameters $\{\lambda\}$,
and the minimum chi-squared (1256 for 1061 data points). The matrix $M^{init}$ is the inverse of
the covariance matrix $C^{init}$, and $|C^{init}|$ is the determinant of the covariance
matrix. All the calculations were done in the $\overline{MS}$-scheme.
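
As an illustration of Eqs. (24) and (25), the following Python sketch evaluates the Gaussian density and draws random parameter sets from it. It is restricted to two parameters so that the matrix inverse and Cholesky factor can be written out by hand; the central values and covariance matrix are invented toy numbers, and the function names are hypothetical.

```python
import math
import random

def chi2_init(lam, mu, m_init):
    """Quadratic form of Eq. (25): sum_ij (lam_i - mu_i) M_ij (lam_j - mu_j)."""
    d = [l - m for l, m in zip(lam, mu)]
    n = len(d)
    return sum(d[i] * m_init[i][j] * d[j] for i in range(n) for j in range(n))

def p_init(lam, mu, c_init):
    """Gaussian density of Eq. (24) for a 2x2 covariance matrix (toy case)."""
    (a, b), (c, e) = c_init
    det = a * e - b * c
    m_init = [[e / det, -b / det], [-c / det, a / det]]  # inverse of C
    norm = math.sqrt((2.0 * math.pi) ** len(mu) * det)
    return math.exp(-0.5 * chi2_init(lam, mu, m_init)) / norm

def sample(mu, c_init, rng):
    """Draw one parameter set using the Cholesky factor of a 2x2 covariance."""
    (a, b), (_, e) = c_init
    l11 = math.sqrt(a)
    l21 = b / l11
    l22 = math.sqrt(e - l21 ** 2)
    z1, z2 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
    return [mu[0] + l11 * z1, mu[1] + l21 * z1 + l22 * z2]

rng = random.Random(7)
mu = [0.5, 3.0]                           # toy central values
c_init = [[0.04, 0.018], [0.018, 0.09]]   # toy covariance matrix

draws = [sample(mu, c_init, rng) for _ in range(20000)]
mean0 = sum(d[0] for d in draws) / len(draws)
mean1 = sum(d[1] for d in draws) / len(draws)
cov01 = sum((d[0] - mean0) * (d[1] - mean1) for d in draws) / len(draws)
print(f"sample means: {mean0:.3f}, {mean1:.3f}; sample covariance: {cov01:.4f}")
print(f"density at the central values: {p_init(mu, mu, c_init):.3f}")
```

Sets drawn in this way carry the parameter correlations of the covariance matrix, which is why comparing individual parameters in isolation can be misleading.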

Comparison with MRS and CTEQ sets showed a good overall agreement
with a few exceptions. One example is the difference in the gluon distribution
function at large values of x. The CTEQ and MRS distributions are somewhat
above the result of Ref. ||. This difference was attributed to the fact that
prompt photon data were included in the CTEQ and MRS fits. Note that the
direct photon data have a large scale uncertainty, and it might be misleading to
include them in a fit without taking the theoretical uncertainty into account.
Also, it is important to keep in mind that it is misleading to compare specific
PDF's, as the correlations between different PDF parameters are large.